Visual Data Mining of RNA Secondary Structure and Folding Pathways as Determined by the Massively Parallel Genetic Algorithm
Abstract
RNA folding pathways are proving to be important in determining RNA function. Studies indicate that RNA may enter intermediate and multiple conformational states that are key to its functionality. These states may have a significant impact on gene expression and molecular function. It is known that the biologically functional states of RNA molecules may not correspond to their minimum energy state, that kinetic barriers may trap the molecule in a local minimum, that folding often occurs during transcription, and that in some cases a molecule transitions between two or more functional conformations. Methods for simulating the folding pathway and dynamic behavior of an RNA molecule are therefore important for predicting RNA structure and its associated functions. We have developed several visual data mining techniques associated with a massively parallel genetic algorithm for RNA structure prediction, as well as with STRUCTURELAB, our RNA/DNA structure analysis workbench. These methodologies are used to determine the significant intermediate and final structures associated with RNA folding. Because the genetic algorithm is essentially stochastic, multiple runs are required. The visualization procedures give significant feedback on the characteristics of the folding runs. This feedback encompasses: interpretation of results from individual genetic algorithm runs, based on population consensus or best-fit structures, including the discovery of transition states in the folding process; final results of individual runs; and interpretation of genetic algorithm results from multiple RNA sequences of the same family to identify structural elements common to the different sequences. In addition, fitness maps and results derived from different population sizes are used.
The combination of the visualization techniques with other methodologies embedded within the STRUCTURELAB and genetic algorithm environments helps to determine the overall picture of the folding pathway or final structure(s) of a given RNA sequence. This paper describes several of these techniques and shows how they are used to help solve this highly combinatorial problem.
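The idea of reading a consensus out of several stochastic runs can be illustrated with a minimal sketch. It is not the STRUCTURELAB implementation or the authors' method. It assumes, purely for illustration, that each run reports its final structure in dot-bracket notation (a format the abstract does not mention), and the names `base_pairs`, `consensus_pairs`, and `min_fraction` are hypothetical. The sketch counts how often each base pair appears across runs and keeps those present in at least a chosen fraction of them.

```python
from collections import Counter


def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def consensus_pairs(structures, min_fraction=0.5):
    """Map each base pair to the fraction of runs containing it,
    keeping only pairs found in at least `min_fraction` of the runs."""
    counts = Counter()
    for structure in structures:
        counts.update(base_pairs(structure))
    n = len(structures)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_fraction}


# Hypothetical final structures from four independent runs (illustrative data only).
runs = [
    "((((...))))",
    "((((...))))",
    ".(((...))).",
    "((.......))",
]

if __name__ == "__main__":
    for pair, fraction in sorted(consensus_pairs(runs).items()):
        print(pair, fraction)
```

Raising `min_fraction` narrows the result to pairs that nearly every run agrees on, while lowering it exposes alternative pairings that may correspond to competing conformations.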
Similar resources
Determination of RNA folding pathway functional intermediates using a massively parallel genetic algorithm (abstract of invited talk)
RNA folding pathways are proving to be quite important in the determination of RNA function. Studies indicate that RNA may enter intermediate conformational states that are key to its functionality. These states may have a significant impact on gene expression. It is known that the biologically functional states of RNA molecules may not correspond to their minimum energy state. Specifically, ki...
Relation Between RNA Sequences, Structures, and Shapes via Variation Networks
Background: RNA plays a key role in many biological processes, and its tertiary structure is critical for its biological function. RNA secondary structure represents various significant portions of RNA tertiary structure. Since the biological function of RNA is concluded indirectly from its primary structure, it would be important to analyze the relations between the RNA sequences and t...
The massively parallel genetic algorithm for RNA folding: MIMD implementation and population variation
A massively parallel Genetic Algorithm (GA) has been applied to RNA sequence folding on three different computer architectures. The GA, an evolution-like algorithm that is applied to a large population of RNA structures based on a pool of helical stems derived from an RNA sequence, evolves this population in parallel. The algorithm was originally designed and developed for a 16384 processor SIM...
A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems
Intrusion detection systems are designed to provide security in computer networks, so that if an attacker gets past other security devices, they can detect and prevent the attack. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...